Summary

The crystal structure of a conserved domain of nonstructural
protein 3 (nsP3) from severe acute respiratory syndrome
coronavirus (SARS-CoV) has been solved by single-wavelength
anomalous dispersion to 1.4 Å resolution. The structure of
this “X” domain, seen in many single-stranded RNA viruses,
reveals a three-layered α/β/α core with a macro-H2A-like fold.
The putative active site is a solvent-exposed cleft that
is conserved in its three structural homologs, yeast
Ymx7, Archaeoglobus fulgidus AF1521, and Er58 from
E. coli. Its sequence is similar to yeast YBR022W
(also known as Poa1p), a known phosphatase that
acts on ADP-ribose-1"-phosphate (Appr-1"-p). The
SARS nsP3 domain readily removes the 1" phosphate
group from Appr-1"-p in in vitro assays, confirming
its phosphatase activity. Sequence and structure comparison
of all known macro-H2A domains combined
with available functional data suggests that proteins
of this superfamily form an emerging group of nucleotide
phosphatases that dephosphorylate Appr-1"-p.

Introduction 

Severe acute respiratory syndrome (SARS) emerged as
the first severe and readily transmissible new disease of
the 21st century and caused 8000 infections and more
than 800 deaths in 2003 (Groneberg et al., 2003). The
causative organism is a new coronavirus (SARS-CoV)
that is distantly related to group II coronaviruses. The
virus has a single-stranded RNA genome of ~29.7 kb
that encodes at least 14 putative open reading frames
(ORFs) (Peiris et al., 2003; Drosten et al., 2003) (Figure
1A). Two-thirds of the viral genome at the 5' end is organized
as a single highly conserved ORF, known as ORF1a/1ab,
that is translated into two large polyproteins,
pp1a and pp1ab (Ksiazek et al., 2003). Translation of
pp1ab involves ribosomal frameshifting, a feature also
seen in many other coronaviruses (Baranov et al.,
2005; Snijder et al., 2003). Termed the “replicase
polyproteins,” pp1a and pp1ab are subsequently
posttranslationally cleaved by two virus-encoded proteases, the
3C-like protease (the main protease or 3CL-Pro) and
the papain-like cysteine protease (PLP), into 16 mature
protein products (Snijder et al., 2003) (Figure 1A). These
“nonstructural proteins” or nsPs (nsP1-nsP16) form a
giant replicase complex that participates in numerous
functions during viral infection, such as replication of
the RNA genome, processing of subgenomic RNA, and
packaging of newly budding virions (Ziebuhr et al.,
2001).

The third of these nonstructural proteins, nsP3, is
a large multidomain protein of 1922 amino acids that
spans residues 819-2740 of pp1a (NP_828862; gi:
34555776) (Thiel et al., 2003). Mature nsP3 results from
proteolytic cleavage of pp1a at two sites (818G↓A819
and 2740G↓K2741) by the papain-like proteinase (Thiel
et al., 2003). nsP3 has conserved sequence motifs of
six independent domains: (1) an N-terminal Glu-rich
acidic domain; (2) an X domain with predicted Appr-1"-p
processing activity; (3) a SUD domain (SARS-specific
unique domain); (4) a peptidase C-16 domain that contains
the papain-like protease; (5) a transmembrane domain;
and (6) the Y domain (Figure 1A).

The conserved X domain of nsP3 has been predicted
to house a putative adenosine diphosphate ribose 1"
phosphatase (ADRP) function and is annotated in domain
classification databases such as SMART (Letunic
et al., 2004) and Interpro (Mulder et al., 2003) as a member
of the A1pp superfamily that includes more than 300
proteins from archaea, bacteria, eukaryotes, and
single-stranded, positive-sense RNA viruses. Structures of
three homologs from this superfamily, yeast Ymx7,
E. coli Er58, and a conserved C-terminal domain of
nonhistone macro-H2A from Archaeoglobus fulgidus
(AF1521), show that they all adopt a generic
macro-H2A-like fold with minor variations. While the function
of some members of this superfamily, like the human
poly-(ADP-ribose) polymerase, has been experimentally
characterized (D’Amours et al., 1999), that of many
others is yet to be determined.

As a part of an integrated program to study emerging
infectious diseases, we have undertaken the structural
and functional characterization of all of the major protein
products in SARS-CoV by using a multipronged approach.
This included a detailed bioinformatics analysis
of the SARS-Tor2 genome by using sensitive profile-based
methods like PSI-BLAST (Altschul et al., 1997),
FFAS (Rychlewski et al., 2000), and HMMER (Eddy,
1996) for detection of remote homologs, identification
of domain boundaries in multidomain proteins, and
functional annotation. Based on this analysis, 173 constructs
that cover the entire proteome were designed
and cloned into vectors for overexpression in E. coli
and baculovirus systems (http://sars.scripps.edu).

In this study, we present the first of the crystal structures
from this effort, that of the highly conserved
putative phosphatase domain of nsP3. To our knowledge,
this is the first crystal structure of this domain
from positive-sense, single-stranded RNA viruses. It
reveals a close structural relationship with prototypical
macro-H2A-like fold proteins. One of its sequence
homologs, Poa1p (YBR022) from Saccharomyces
Figure 1. Genomic Location of the X Domain of SARS nsP3

(A) Schematic of the SARS genome and proteome showing the location of SARS nsP3 and its putative functional domains. The predicted functions
of different nsPs of ORF1a/ORF1ab are highlighted along with the structural and accessory genes. Abbreviations used are: P65, P65 protein
homolog of murine hepatitis virus; 3CL-Pro, SARS main protease; RdRp, RNA-dependent RNA polymerase; Hel, Zn2+-dependent helicase; ExoN,
homolog of exonuclease; NendoU, uridylate-specific endonuclease; 2'-O-MT, methyltransferase; E, small envelope glycoprotein; M, matrix; N,
nucleocapsid; ADRP, ADP-ribose-1"-phosphate phosphatase; SUD, SARS-specific unique domain; PL2-Pro, papain-like protease; TM,
transmembrane domain. (Figure modified from Snijder et al., 2003).

(B) Sequence alignment of known macrodomains. Group I: ADRP domain found in nsPs of different coronaviruses: Sars_nsP3, SARS
coronavirus-Tor2; R1AB_CVBM, bovine coronavirus; AAR01012, human coronavirus OC43; Q66WN5, murine hepatitis virus; Q6Q1S3, human
coronavirus NL63; R1AB_PEDV7, porcine epidemic diarrhea virus; R1AB_CVPPU, transmissible gastroenteritis virus; R1AB_IBVBC, avian infectious
bronchitis virus. Group II: ADRP homologs from other related viruses: Q6X2U4, rubella virus; O90370, Igbo Ora virus; Q8JUX6, Chikungunya
virus; O10380, Semliki forest virus; Q8QHM4, Mayaro virus; Q9JGK9, Ross River virus; P87515, Barmah forest virus; Q87644, Sindbis virus; Q86924,
Aura virus; Q88791, Western equine encephalomyelitis virus; Q66580, Eastern equine encephalitis virus; O90163, Venezuelan equine
encephalitis virus; Q8QL53, sleeping disease virus; O90368, O'nyong-nyong virus; Q8JJX1, salmon pancreatic disease virus. Group III: macrodomain
hypothetical proteins of the A1pp superfamily: gi|20178242, E. coli ymdB; gi|20178260, Deinococcus radiodurans; gi|20178146, Ralstonia
solanacearum; gi|20178157, Mesorhizobium loti; gi|20178167, Pseudomonas aeruginosa; gi|20090472, Methanosarcina acetivorans; gi|20178156,
Thermoplasma volcanium; gi|16082127, Thermoplasma acidophilum; gi|19705253, Fusobacterium nucleatum; gi|20178176, Pyrococcus abyssi;
gi|20178181, Pyrococcus horikoshii; gi|11499116, Archaeoglobus fulgidus (AF1521); gi|20178224, Pyrobaculum aerophilum; gi|20178255,
Sulfolobus solfataricus; gi|20178177, Thermotoga maritima; gi|20094386, Methanopyrus kandleri; gi|20178182, Aquifex aeolicus; gi|20178236,
Aeropyrum pernix; gi|20178237, Mycobacterium tuberculosis. The two structurally characterized members of group III are highlighted in red.


cerevisiae, was recently functionally characterized as
a highly specific phosphatase that removes the 1"
phosphate group of ADP-ribose-1"-phosphate (Appr-1"-p)
in the latter half of the tRNA splicing pathway
in yeast (Shull et al., 2005), hinting at a similar substrate
specificity for SARS ADRP. Using an in vitro assay, we
experimentally validate that this ADRP domain of SARS
nsP3 is indeed a phosphatase that removes the terminal
1" phosphate from Appr-1"-p. To our knowledge,
these results, combined with recently elucidated
structures of two hypothetical proteins, suggest that a
majority of macro-H2A fold members form a new family
of nucleotide phosphatases.

Results and Discussion 

Description of the ADRP Domain of SARS nsP3 
The cloned insert contains 182 residues from nsP3 of
SARS-CoV-Tor2 and has a monomer molecular weight
of 19,523 Da and a pI of 6.9. The final structural model
refined against crystallographic data to 1.4 Å resolution
has four subunits in the asymmetric unit in very similar
Table 1. Data Collection and Refinement Statistics

                                 SeMet (Peak 1)     Native

Data Collection
Space group                      P2₁2₁2₁            P2₁2₁2₁
Unit cell parameters             a = 76.920 Å,      a = 76.495 Å,
                                 b = 81.224 Å,      b = 81.585 Å,
                                 c = 125.695 Å      c = 125.465 Å
Wavelength (Å)                   0.97941            0.9794
Resolution range (Å)             50.0-2.2           50.0-1.40
Number of observations           1,909,845          2,836,583
Number of unique reflections     41,023             145,609
Completeness (%)                 100.0 (99.8)       99.20 (98.9)
Redundancy                       7.6                3.7
Mean I/σ(I)                      30.13 (6.04)       32.93 (2.82)
Rsym on I (a)                    0.091 (0.507)      0.043 (0.590)
Highest resolution shell (Å)     2.24-2.20          1.42-1.40
Figure of merit after RESOLVE    0.54

Refinement
Rwork (b)                                  16.4 (22.2)
Rfree (c)                                  19.0 (25.0)
Protein atoms (average B factor)           5,485 (18.1)
Solvent atoms (average B factor)           950 (36.91)
Hetero atoms (average B factor)            72 (35.28)
Rmsd bond length (Å)                       0.018
Rmsd bond angle (°)                        1.69
Ramachandran statistics
  Most favored (%)                         90.8
  Additionally allowed (%)                 8.4
  Generously allowed (%)                   0.8

(a) $R_{sym} = \sum_{hkl}\left[\left(\sum_j |I_j - \langle I \rangle|\right) / \sum_j |I_j|\right]$.
(b) $R_{work} = \sum_{hkl} |F_o - F_c| / \sum_{hkl} |F_o|$, where $F_o$ and $F_c$ are the observed and
calculated structure factors, respectively.
(c) 5% of the reflections (7,683 reflections) was used in the calculation of Rfree.
Values in parentheses are for data corresponding to the outermost
shell.


conformations (rmsds < 0.4 Å for 166 Cα atoms). We do
not observe electron density for a few residues at the C
terminus of each of the four monomers. These include 5
residues in chain D, 15 residues in chain A, and 9 residues
each in the B and C monomers. The final refinement
statistics and stereochemical parameters of the
structure are listed in Table 1. Overall, each subunit consists
of eight β strands and six α helices (Figure 2A).
Strands 2-8 form a central seven-stranded β sheet that
has a strand order of 2387465. The outermost strands
on either side are antiparallel to the rest. The six helices
straddle the central β sheet to form a three-layered α/β/α
topology. Two of the subunits in the asymmetric unit
form a loosely packed head-to-head dimer (Figure 2B).
A short loop connecting strand 6 and helix H4 is involved
in weak interfacial contacts with the conserved Gly-rich
segment of the other monomer. The interface is fairly
small at ~870 Å² (435 Å² per monomer) and predominantly
nonpolar (60%). Residues that form the putative
active site lie close to the dimer interface. The enzyme
elutes as a homodimer in gel-filtration studies (data
not shown), indicating that the physiologically relevant
form of this protein may be dimeric.



Figure 2. Structure of SARS ADRP 

(A) Ribbon representation of the SARS nsP3 ADRP monomer. The 
two glycine-rich loops are shown in yellow. Secondary structures 
are colored from blue (N) to red (C terminus). Helices are numbered 
H1-H6, and β strands are numbered from 1 to 8.

(B) The SARS ADRP dimer observed between the B and D subunits 
in the asymmetric unit. The four conserved segments are colored 
red in each subunit; the conserved histidines and asparagines at 
the active site are shown as ball-and-sticks. 


SARS ADRP Belongs to the Macro-H2A Fold 
Comparison of one of the chains of SARS ADRP against
all known structures in the PDB by using DALI (Holm and
Sander, 1993) revealed the presence of two structural
homologs: a hypothetical protein from Archaeoglobus
fulgidus, AF1521 (PDB code: 1HJZ; z score of 18.1;
rmsd of 2.4 Å for 152 superimposed Cα atoms; pairs
with a z score > 3.0 are considered structurally similar),
Figure 3. Fold Classification of the SARS ADRP Domain

(A-F) (A) Bovine lens Leu-aminopeptidase (1LAM). (B) E. coli PepA (1GYT). (C) Yeast Appr phosphatase homolog (1TXZ). (D) E. coli hypothetical
protein Er58 (1SPV). (E) Archaeoglobus fulgidus AF1521. (F) ADRP domain of SARS nsP3. The helices are colored cyan, and the strands are colored
yellow in the core macrodomain. The inserted secondary structural elements that are not part of the main core are highlighted in red. The
circular permutation seen in yeast Ymx7 (1TXZ) is marked in green, and the C-terminal helical domain is shown in white.


and the N-terminal domain of bovine lens leucine
aminopeptidase (PDB code: 1LAM; z score of 8.0; rmsd of 2.6 Å
for 119 Cα atoms; Strater and Lipscomb, 1995). Both
structures are members of the “macrodomain-like”
fold as defined in the SCOP database (Murzin et al.,
1995). This fold includes two other structural homologs
from E. coli, aminopeptidase A (PepA) and a hypothetical
protein ymdB (Northeast Structural Genomics Consortium
target Er58; PDB code: 1SPV). The topological connectivity
of the secondary structural elements of these
four proteins along with the ADRP domain of SARS
nsP3 is shown in a similar orientation in Figure 3 (A, B,
D, E, and F). All of them share the same three-layered
α/β/α core, with minor variations. They have a mixed
β sheet of six strands with strand order 165243. The first
strand and the first helix are absent in bovine lens leucine
aminopeptidase (blLAP, Figure 3A). AF1521 has two insertions
to this core, a β strand inserted at the N terminus
and an α helix between strands 3 and 4 (Figure 3E). The
SARS ADRP domain has two β strands inserted at the
N terminus, one of which forms part of the central β sheet
(Figure 3F). The sixth protein structure in Figure 3, Ymx7
from yeast (PDB code: 1TXZ), is a member of this fold
and has a circular permutation (Figure 3C). The first
strand and the first helix of this protein occupy structural
positions that correspond to the last β strand and the
C-terminal helix (H6) of a canonical macro-like fold.

Function of Macro-like Fold Proteins 

blLAP is an exopeptidase that cleaves amino acids from
the N terminus of polypeptides (Burley et al., 1990).
E. coli PepA is a DNA binding protein that is involved
in Xer site-specific recombination and transcriptional
control of the carAB operon (Strater et al., 1999). Although
both share significant similarities in sequence
(31% identity) and structure (both are homohexameric
with a dinuclear Zn2+ in their active site), they have
widely different functions. The peptidase activity is not
needed by PepA to function during Xer-specific recombination
(McCulloch et al., 1994) or during repression of
carAB transcription (Charlier et al., 1995). On the other
hand, blLAP does not have any demonstrated DNA
binding function.

AF1521 is a stand-alone macrodomain from Archaeoglobus
fulgidus and is a close homolog of the C-terminal
nonhistone domain of the largest variant of histone H2A
(Pehrson and Fried, 1992; Pehrson and Fuji, 1998). It is
evolutionarily related to P loop-containing nucleotide
triphosphate hydrolases. The structure has been solved in
its apo form (Allen et al., 2003) and in complex with two
ligands, Mg2+-ADP (PDB code: 2BFR, unpublished)
and ADP-ribose (PDB code: 2BFQ; unpublished). Yeast
Ymx7 is a conserved hypothetical protein from the
ADH3-RCA1 intergenic region. Although its function has
not been experimentally demonstrated, it has been annotated
as an ADP-ribose-1"-monophosphatase (ADRP)
based on its sequence similarity to known ADRPs
(Kumaran et al., 2005). Finally, the structure of a conserved
hypothetical protein, Er58, from E. coli that was solved
by the Northeast Structural Genomics Consortium (PDB
code: 1SPV, unpublished) reveals a canonical macro-like
fold. Its function remains unknown.

It would thus appear that the five known members of
this fold fall into two broad functional groups, one containing
blLAP and E. coli PepA and the second containing
the other three hypothetical proteins. All members of
the second group not only share a similar global architecture,
but also share conserved active site features.
Although all of these proteins can be picked up by
PSI-BLAST by using SARS ADRP as a query template,
Figure 4. Structure Comparison of SARS
ADRP with YMX7 and AF1521

(A) Surface of SARS ADRP showing the distribution
of electrostatic potential. The ADP-ribose
bound complexes of AF1521 (PDB
code: 2BFQ; 1.5 Å) and yeast Ymx7 (PDB
code: 1TXZ; 2.0 Å) are shown for comparison.
The bound ligands in the two structures are
shown as ball-and-sticks.

(B) Superposition of the three structures
(SARS ADRP is in green, AF1521 is in cyan,
and Ymx7 is in purple). The bound ADP and
ADP-ribose are shown as ball-and-sticks.
The residues from SARS ADRP that are proposed
to interact with the ligand are shown
in ball-and-sticks, and the putative interactions
are highlighted as dotted lines.

(C) Structure-based sequence alignment of
SARS ADRP with its four structure homologs:
AF1521 ADP-ribose complex (2BFQ); E. coli
hypothetical protein Er58 (1SPV); E. coli
PepA (1GYT); and yeast Ymx7 ADP-ribose
complex (1TXZ). Helical regions are in cyan,
and β strands are in yellow. Regions that
can be confidently aligned are in capital letters,
and those that align poorly or do not
align at all are in small letters. The four conserved
segments are highlighted in rectangular
blocks. The circular permutation of yeast
Ymx7 is marked in red.
it is clear that the SARS domain is closer to phosphatases
of the second group.

The Putative Active Site 

The ADRP domain of SARS nsP3 has a deep solvent-exposed
cleft on the protein surface that is very similar
to that seen in AF1521, yeast Ymx7, and E. coli Er58. Surface
representations showing the distribution of electrostatic
potential on SARS ADRP and on the structures of
ligand bound forms of AF1521 and yeast Ymx7 (shown in
Figure 4A) clearly indicate that the putative active site
cleft is similar in the three structures. Repeated soaking
and cocrystallization attempts failed to yield cocrystals
of SARS ADRP with ADP-ribose, perhaps because the
active site is occluded by the dimer interface. However,
the availability of the product (ADP-ribose) bound forms
of AF1521 and yeast Ymx7 facilitates a detailed structure
comparison of these two homologs with SARS
ADRP. Many of the residues that interact with the ligand
are conserved in the three structures. A view of the proposed
active site of SARS ADRP along with the superimposed
structures of AF1521 and yeast Ymx7 is shown
in Figure 4B, highlighting the likely interactions
between residues of the protein and the ligand. A
structure-based sequence alignment of SARS ADRP
with four of its structural homologs is shown in Figure
4C. The blLAP sequence was omitted in this alignment.

Most macro-like fold proteins, including the ADRP domain
from RNA viruses, show the presence of four conserved
stretches of residues (Figure 1B). The first motif
“XXNAAN,” where XX are any two hydrophobic amino
acids, is highly conserved across the superfamily. This
is immediately followed by a Gly-rich region (GGGVAG)
that is reminiscent of the Walker A motif seen in many
P loop nucleotide hydrolases (Walker et al., 1982). A notable
feature is that the invariant lysine of the Walker A
motif is an arginine in some coronaviruses and is absent
in others (Figure 1B). The third stretch, “XWGP,” where
X is often a conserved histidine, is in the middle of the
polypeptide. Finally, a stretch of 4 residues mainly consisting
of small hydrophilic amino acids and a glycine is
present near the C terminus of the polypeptide chain
(Figure 1B). Residues from the third motif occupy structurally
similar positions to the Walker B motif in classical
P loop hydrolases. These four regions line the putative
active site of the ADRP domain of the SARS nsP3 structure.
The first motif forms part of the fourth β strand (Figure
4C), while the Gly-rich segment is part of a loop that
connects strand 4 with the second helix. The third motif
connects strand 6 to helix H4.

Description of the Putative Active Site of SARS ADRP 
The active site can be broadly divided into the adenine
binding cleft, the first ribose, and the bisphosphate
binding site, followed by the terminal ribose-phosphate
binding pocket that is the center of catalysis. As anticipated,
the adenine binding pocket consists of largely
hydrophobic residues. It is less conserved in the three
structures than the other two pockets. In SARS ADRP,
residues Ile23, Ala52, Pro125, and Ala154 form the walls
of the putative adenine binding cleft. In the AF1521-ADP-ribose
complex structure, the adenine ring is stabilized
by two hydrogen bonds. One of the side chain
carbonyl oxygens of Asp20 is within hydrogen bonding
distance to the N1 and N6 of the adenine ring. In
SARS ADRP, the equivalent residue is Asp22 and is
likely to play a similar role. The other hydrogen bond is
between the N7 and the backbone carbonyl group of
Gly42. The binding site of the first ribose ring is a highly
hydrated solvent-exposed cleft in which multiple
water-mediated interactions are seen between the ribose and
residues Asp177 and Ser180 in AF1521. In SARS
ADRP, residues Asn156 and Asp157 that lie in a loop between
strand 6 and helix H6 are likely to stabilize the ribose
by forming similar polar interactions.

The α and β phosphates of the ADP moiety are mainly
stabilized by backbone hydrogen bonds with the two
Gly-rich motifs in a manner similar to that observed in
P loop hydrolases. While the α phosphate is stabilized
by hydrogen bonds with the backbone of the three consecutive
glycines of motif II, the β phosphate interacts
with the amides of Gly130 and Ile131. This loop also
helps to stabilize the β phosphate, as the Walker B motif
does in P loop hydrolases.

The terminal ribose moiety of the ADP-ribose-1"-phosphate
lies on a cleft that is approximately perpendicular
to the adenine binding pocket. This is the putative
site of catalysis. The side chain amide nitrogen of
the conserved asparagine (N80) forms hydrogen bonds
with the O3 and O4 of the ribose in the yeast Ymx7-ADP-ribose
complex (Figure 4D). This residue from the first
motif superimposes almost perfectly with Asn40 in
SARS ADRP and Asn34 in AF1521 and is invariant
among all macro-like fold members (Figure 1B). Asp90
and His145, two residues that have been implicated in
catalysis in yeast Ymx7, lie embedded underneath the
loop that connects strand 4 and helix H2 (Figure 4D).

Enzymatic Activity of SARS ADRP 
Given the close similarity between the three structures
(SARS ADRP, AF1521, and Ymx7) and the similarity
at the sequence level between SARS ADRP and yeast
Poa1p (an enzyme with demonstrated ADRP activity),
it was apparent that their function was likely to be similar
as well. We therefore tested the ability of SARS ADRP to
dephosphorylate Appr-1"-p in vitro. We employed a generic
assay that monitors the liberation of inorganic
phosphate in solution during catalysis (Webb, 1992).
The results are shown in Figure 5. We observed a sustained
release of phosphate after the addition of increasing
amounts of the substrate (Appr-1"-p) to the assay
containing fixed amounts of the enzyme (Figure 5A).
Upon overnight incubation, the amount of phosphate released
was proportional to the amount of the substrate
added, suggesting that SARS ADRP indeed had the ability
to dephosphorylate Appr-1"-p into ADP-ribose and
inorganic phosphate (Figure 5B). Further kinetic characterization
of the enzyme shows that the rate of dephosphorylation
is relatively low, with a KM of 52.7 ± 8.2 µM and a kcat
of 5.19 min⁻¹. While the observed catalytic efficiency of
this enzyme is not very high, it is comparable to the values
reported for Poa1p. In a TLC-based assay with radiolabeled
substrates, both Poa1p and Hal2p, a known
3' phosphatase of 5',3'-pAp, showed similar low catalytic
yields (KM = 2.8 µM; kcat = 1.7 min⁻¹ for Poa1p),
but both enzymes were highly specific for the Appr-1"-p
substrate (Shull et al., 2005). A few well-known phosphatases
whose activity has been monitored by the
same assay (Wang et al., 1995) also show similar levels
of activity.
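
The kinetic constants quoted above can be placed on a common footing with a short calculation. The following is a minimal Python sketch, assuming simple Michaelis-Menten behavior; the class and function names are hypothetical and chosen only for illustration, and the only values taken from the text are the reported KM and kcat for SARS ADRP and for Poa1p (Shull et al., 2005). The unit conversion to M⁻¹ s⁻¹ is a convention adopted here, not one stated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KineticParams:
    """Michaelis-Menten constants for one enzyme (illustrative container)."""
    name: str
    km_uM: float         # Michaelis constant, micromolar
    kcat_per_min: float  # turnover number, per minute

    def turnover_rate(self, substrate_uM: float) -> float:
        """Rate per enzyme molecule, v/[E] = kcat*[S]/(KM + [S]), in min^-1."""
        return self.kcat_per_min * substrate_uM / (self.km_uM + substrate_uM)

    def catalytic_efficiency(self) -> float:
        """kcat/KM converted to M^-1 s^-1 (min -> s, micromolar -> molar)."""
        return (self.kcat_per_min / 60.0) / (self.km_uM * 1e-6)


# Values as quoted in the text; uncertainties are ignored in this sketch.
SARS_ADRP = KineticParams("SARS ADRP", km_uM=52.7, kcat_per_min=5.19)
POA1P = KineticParams("Poa1p", km_uM=2.8, kcat_per_min=1.7)

if __name__ == "__main__":
    for enzyme in (SARS_ADRP, POA1P):
        print(
            f"{enzyme.name}: kcat/KM = {enzyme.catalytic_efficiency():.3g} M^-1 s^-1, "
            f"v/[E] at 100 uM substrate = {enzyme.turnover_rate(100.0):.2f} min^-1"
        )
```

By construction, the rate at a substrate concentration equal to KM is half of kcat, which is a quick sanity check on any fitted parameters.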

There might be multiple reasons for the low activity 
levels detected in these assays. It might be intrinsic 
for this class of enzymes, as seen in the case of yeast 
Poa1p. Moreover, the released product, ADP-ribose, is
a competitive inhibitor of this reaction (Shull et al., 
2005). The proposed active site is occluded at the dimer 
interface in the crystal structure (Figure 2B) and might be 
hindering access to the substrate in our in vitro assay. 
The in vivo scenario might be different, where enzyme 
activity might be regulated by other components of the 
replicase complex. 

Catalytic Mechanism 

Yeast Ymx7, one of the structural homologs of SARS
ADRP, has been proposed to perform the same reaction.
It is a remote homolog of the macrodomain superfamily,
albeit with a circular permutation (Figure 3C). It
also has a different set of catalytic residues at the active
site when compared to classical macrodomains. Based
on the structure of the ADP-ribose bound Ymx7,
Kumaran et al. (2005) have speculated on a catalytic mechanism
that involves three residues: Asp90, His145, and
Asn80. While the histidine and asparagine residues are
conserved in all three of the structures, Ymx7, AF1521,
and SARS nsP3, the equivalent position of Asp90 of
Ymx7 is an alanine in the other two (Ala50 in ADRP and
Ala44 in AF1521-ADP-ribose complex; Figure 4D). This
would imply that while the proposed mechanism might
Figure 5. Enzymatic Activity of the SARS ADRP Domain

(A) Continuous release of inorganic phosphate monitored by an increase
in absorbance for the initial 20 min of the reaction at two substrate
concentrations.

(B) Amount of phosphate released after overnight incubation of
5 µM enzyme with three different concentrations of the substrate. The error
bars correspond to the standard deviations of three independent
measurements at each concentration.

(C) Michaelis-Menten kinetics (rate plot) showing the activity of the
enzyme at different concentrations of the substrate (Appr-1"-p)
and the obtained values of KM and kcat.


be correct in the case of yeast Ymx7, it cannot be the
mode of dephosphorylation for either ADRP or AF1521.
Instead, these two enzymes have a histidine (His45 in
ADRP and His39 in AF1521) residue that might be in
close proximity to the terminal 1" phosphate of the substrate
and might therefore be involved in catalysis. Alternatively,
it might be speculated that the role of the predominant
nucleophile might be played by one of the
aspartates or glutamates from the loop 101NAGEDIQ107
in SARS and other coronaviral ADRPs. This loop shows
large conformational changes in the apo and ligand
bound forms of AF1521 and Ymx7 and is rich in acidic
residues (Figure 1B). Further mechanistic studies,
cocrystallization experiments, and mutagenesis of the
residues that are implicated here are necessary to elucidate
the catalytic mechanism of this enzyme. Despite repeated
attempts at soaking and cocrystallization, we
have not been able to observe density of the bound substrate.
A possible reason might be the limited accessibility
of the active site, as it is buried in the dimer interface
during crystal packing (Figure 2B).

Discussion 

The demonstrated function of SARS ADRP as an Appr-1"-p
phosphatase has important functional implications
in the SARS life cycle. While the manner in which the virus
infects the human host is fairly well characterized,
many of the postinfection events that occur in the intracellular
milieu of the host remain poorly understood. The
infection process begins when the spike glycoprotein
present on the viral coat recognizes one of two receptors
present on the human cell surface: angiotensin-converting
enzyme-2 (ACE-2) (Li et al., 2003; Kuhn
et al., 2004) or a C-type lectin known as L-SIGN or
CD209L (Jeffers et al., 2004). In arteri- and coronaviruses,
an early postinfection event is the transcription
of a nested set of subgenomic RNA (Lai and Holmes,
2001; Thiel et al., 2003). The resulting mRNAs contain
a short 5'-terminal “leader” sequence derived from the
5' end of the genome. The fusion of the two noncontiguous
RNA segments is achieved by a discontinuous step
in the synthesis of the minus strand and involves transcription
regulatory sequences or TRSs (Thiel et al.,
2003; Pasternak et al., 2001). Very few experimental details
exist on the processing, maturation, and subsequent
roles of these important molecules in the viral
life cycle.

This process has parallels in the eukaryotic tRNA
splicing pathway that has been well studied in yeast
and plants (Culver et al., 1994; Phizicky and Greer, 1993;
Peebles et al., 1983). In these organisms, pre-tRNA
splicing is initiated by cleavage at the splice site by an
endonuclease. The resulting tRNA halves are then ligated
to yield mature tRNA that retains the 2' phosphomonoester
group at the splice site (Phizicky and Greer,
1993; McCraith and Phizicky, 1990). Using NAD as an
acceptor, a phosphotransferase removes the 2' phosphate
to yield ADP-ribose-1"-2" cyclic phosphate or
Appr>p (Culver et al., 1993). A cyclic phosphodiesterase
hydrolyzes Appr>p to yield Appr-1"-p (Culver et al.,
1994; Martzen et al., 1999). The terminal step in this
pathway is a phosphatase-catalyzed conversion of
Appr-1"-p into ADP-ribose and inorganic phosphate,
which are channeled through various cellular metabolic
pathways.

There is increasing evidence that the NendoU (nsP15)
in SARS functions in a manner analogous to the endonuclease
of the tRNA splicing pathway. It is a Mn2+-dependent
enzyme that also releases products with 2'-3'
cyclic phosphorylated ends (Ivanov et al., 2004;
Bhardwaj et al., 2004). While work on this enzyme was in progress,
the eukaryotic homolog of NendoU, the XendoU
from Xenopus laevis, was functionally characterized,
highlighting the existence of an orthologous pathway
in higher eukaryotes (Gioia et al., 2005). Details of this
process are only beginning to emerge. It is noteworthy
that, although orthologs of cyclic phosphodiesterase
(CPD), the enzyme that catalyzes the previous step in
the tRNA splicing pathway, have been found in group II
coronaviruses along with toro- and rotaviruses, it is absent
in the SARS virus (Thiel et al., 2003).

SARS NendoU specifically recognizes uridylate bases
at GUU sites of RNA (Ivanov et al., 2004). The virus protects
its own RNA by methylating the 5'-terminal cap
using an Ado-Met-dependent RNA methyltransferase
(von Grotthuss et al., 2003), a process that is imperative
during coronaviral replication and an active target for
therapeutic intervention (Bach et al., 1995; Vlot et al., 2002).
The possibility that SARS ADRP, NendoU, and the
methyltransferase might be acting in concert and might
therefore be functionally linked has been the subject of
previous speculation (Thiel et al., 2003). The precise
roles of these three enzymes, along with the 3'-5' exonuclease
and RNA polymerase, and their possible interactions with
each other as integral components of the replicase complex
remain poorly understood. It is becoming increasingly
clear that coronaviruses not only differ from other
related viruses in having a larger genome, but
also bear an uncanny similarity to DNA-based
life forms in their ability to maintain, synthesize, and regulate
the proteomic and genomic components of their
life cycle in hitherto unforeseen ways. The work presented
here further reinforces this view and hints at
the possibility of a tRNA splicing pathway-like process
by which the generation of subgenomic RNA and its
subsequent translation to yield mature viral proteins are
regulated.

Orthologs of SARS ADRP are found embedded in nonstructural
proteins of many related ssRNA viruses, especially
in alphaviruses of Togaviridae (group II of Figure
1B). These include, among others, nsP2 of Sindbis virus
(Strauss et al., 1984), nsP3 of O'nyong-nyong virus (Lanciotti
et al., 1998), nsP3 of Ross River virus (Shirako and
Yamaguchi, 2000), P150 of the lone nsP in Rubella virus
(Zheng et al., 2003), nsP3 of Mayaro virus (Anderson
et al., 1954), and nsP3 of Semliki Forest virus (Tuittila
et al., 2000). Many of these viruses have a greatly reduced
genome size (~10 kb), with only about 4-5
ORFs. On the other hand, the five known human coronaviruses,
HCoV-OC43, HCoV-229E, HCoV-NL63, HCoV-HKU1,
and SARS-CoV, have genome sizes of 27-32
kb. The occurrence of this phosphatase as part of their
replicative machinery underscores the importance of
this enzyme in their life cycle and hints at a similar mechanism
by which their genomic and subgenomic RNA
could be processed inside their respective host cells.
However, given the greatly reduced proteome size and
reliance of some Togaviridae members on host enzymes
to meet their replication needs, this process may be
somewhat different from that seen in SARS and the
other human coronaviruses.

Conclusions 

To our knowledge, this study provides the first structural
characterization of a highly specific phosphatase from
an RNA virus. The experimental demonstration of phosphatase
activity on Appr-1"-p, combined with its structural
relationship with other known macro-fold members,
strongly hints at the possibility that many "hypothetical"
proteins of this superfamily might in fact be phosphatases
that act on similar substrates. The unique differences
between the active site of SARS ADRP and yeast
Ymx7, both of which dephosphorylate the same substrate,
imply that, while being structurally and functionally
homologous, they probably employ different catalytic
mechanisms. Further studies are needed to fully
explore the functional significance of this enzyme in the
larger context of the membrane-bound replicase complex
and its regulation of translation and replication of
viral RNA. If true, the functional link between SARS
ADRP and other nsPs highlighted here could provide
new avenues for investigation of the replication process
of the virus in infected cells, with the hope of developing
therapeutic agents aimed at inhibiting viral replication.

Experimental Procedures 
Cloning, Expression, and Purification 

The sequence corresponding to residues 184-365 (182 aa) of SARS
nonstructural protein nsP3 (gi:34555776; NP_828862) of polyprotein
pp1a was amplified by polymerase chain reaction (PCR) from
genomic cDNA of SARS-Tor2 strain with Taq polymerase and primer
pairs encoding the predicted 5' and 3' ends (forward:
5'-CCAGTTAATCAGTTTACTGGTTATTTAAAACTTACTGAC-3'; reverse:
5'-CTCCTCTTGTTTAGGTGCTTCC-3'). The PCR product was cloned into
plasmid pMH1f, which encodes an expression and purification tag
(MGSDKIHHHHHH) at the amino terminus. Protein expression was
performed on a sequence-verified clone in native 2xYT or selenomethionine
(SeMET)-containing media by using the E. coli methionine
auxotrophic strain DL41. Bacteria were lysed by sonication in lysis
buffer (50 mM KPO4 [pH 7.8], 300 mM NaCl, 10% glycerol, 5 mM imidazole,
two Roche EDTA-free protease inhibitor tablets) with 0.5
mg/ml lysozyme. The lysate was clarified by ultracentrifugation at
45,000 rpm for 20 min (4°C), and the soluble fraction was applied
onto a metal chelate column (Talon resin charged with cobalt; Clontech).
The column was washed in 20 mM Tris (pH 7.8), 300 mM NaCl,
10% glycerol, 5 mM imidazole and was eluted with 25 mM Tris (pH
7.8), 300 mM NaCl, 150 mM imidazole. The resultant protein was further
purified by using anion exchange chromatography on a Poros
HQ column with elution buffer containing 25 mM Tris (pH 8.0) and
1 M NaCl. The pure fractions of the protein were pooled, and the
buffer was exchanged into crystallization buffer (10 mM Tris [pH
7.8], 150 mM NaCl) and concentrated by centrifugal ultrafiltration.
The final concentrations of native and SeMET protein were 1.0 mM
and 1.4 mM, respectively. The protein was either frozen in liquid nitrogen
for later use or used immediately for crystallization trials.

Crystallization and Data Collection 

The protein was crystallized with the nanodroplet vapor diffusion
method (Santarsiero et al., 2002) by using standard JCSG crystallization
protocols (Lesley et al., 2002). Thick, rectangular, rod-like
crystals (~200 μm × ~100 μm × ~75 μm) appeared after 10 days
in 0.4 μl drops containing 0.2 μl each of protein and crystallization
well solution containing 1.5 M sodium malonate (pH 7.0). A higher
concentration (1.8 M) of sodium malonate with 25% glycerol was
used as cryoprotectant. A native 1.4 Å dataset (at a wavelength
of 0.9794 Å) was collected at beamline 11.1 of the Stanford Synchrotron
Radiation Laboratory by using Blu-ICE (McPhillips et al.,
2002). Anomalous diffraction data were collected at the Advanced
Light Source (ALS, Berkeley, CA) on beamline 8.2.1 at a wavelength
of 0.97941 Å, corresponding to the peak wavelength of a selenium
SAD experiment. Reflections were indexed in a primitive orthorhombic
lattice (space group P2₁2₁2₁), integrated, and scaled by using
the HKL2000 suite (Otwinowski and Minor, 1997).
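
As a quick sanity check on the wavelengths quoted above, the sketch below
converts an X-ray wavelength to photon energy with E = hc/λ and compares
the result with the selenium K absorption edge. The function name and the
tabulated edge value (about 12.658 keV) are illustrative assumptions and
are not taken from this study.

```python
# Minimal sketch: convert an X-ray wavelength (in angstroms) to photon
# energy (in keV) using E = h*c / lambda.

HC_KEV_ANGSTROM = 12.39841984  # h*c expressed in keV * angstrom
SE_K_EDGE_KEV = 12.658         # approximate tabulated Se K edge (assumption)


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths reported for the native and SeMET (peak) data sets.
for label, lam in [("native", 0.9794), ("Se peak", 0.97941)]:
    energy = wavelength_to_kev(lam)
    offset_ev = (energy - SE_K_EDGE_KEV) * 1000.0
    print(f"{label}: {lam} A -> {energy:.4f} keV "
          f"({offset_ev:+.1f} eV from the assumed Se K edge)")
```

Both wavelengths fall within a few eV of the assumed edge, as expected
for data collected at the selenium peak.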

Structure Determination and Refinement 

The initial phases were obtained by the single-wavelength anomalous
dispersion (SAD) phasing method with data collected to 2.2 Å
at the selenium peak wavelength by using the program SOLVE (Terwilliger
and Berendzen, 1999). All 12 selenium sites were located,
and the resulting phases had a figure of merit of 0.54 after density
modification procedures by using RESOLVE (Terwilliger, 2003).
The resultant phases from SAD were merged, improved, and extended
for a native data set to 1.4 Å by using the programs CAD
and DM as implemented in the CCP4 package (CCP4, 1994), assuming
four monomers in the ASU with a Matthews coefficient of 2.6 Å³/Da and
a solvent content of 51% (Cowtan, 1994). Automated model building
with ARP/wARP (ver 6.0; Lamzin and Wilson, 1997) traced ~80% of
the backbone and docked 65% of the sequence, including the side
chains. The rest of the sequence was manually built into the density
with O (Jones et al., 1991) and was refined against the high-resolution
native data to 1.4 Å with iterative rounds of model building and
refinement by using Refmac5 (Murshudov et al., 1997) of CCP4.
Although RESOLVE did identify the presence of NCS among the
monomers, it was not used at any stage of refinement. A summary
of data collection and refinement statistics is shown in Table 1.
The stereochemical quality of the final refined model was checked
with PROCHECK (Laskowski et al., 1993), and the dimer interface
was calculated by using the protein-protein interaction server. The
ribbon diagrams were made with PyMOL (DeLano, 2002).

Enzyme Assays 

The substrate Appr-1"-p was a kind gift from Prof. Phizicky
(Rochester Univ, USA) and was enzymatically prepared by reacting
the precursor Appr>p with cyclic phosphodiesterase (CPD) by using
procedures described in Shull et al. (2005). Phosphate release was
monitored by the EnzChek assay (Molecular Probes Inc, Eugene
OR, USA) by following the manufacturer's instructions. The assay
uses the method of Webb (1992), which monitors the release of
inorganic phosphate by coupling the phosphatase reaction with
the enzymatic conversion of 2-amino-6-mercapto-7-methylpurine
riboside (MESG) to 2-amino-6-mercapto-7-methylpurine and
ribose-1-phosphate by purine nucleoside phosphorylase (PNP). The substrate
MESG has an absorbance maximum at 330 nm, whereas
that of the product is at 360 nm. Each 1 ml reaction mixture contained
50 mM Tris (pH 7.5), 1 mM MgCl2, 0.1 mM sodium azide, 200 μM
MESG, 1 U purine nucleoside phosphorylase, and 2.7 μM enzyme.
Increasing amounts of the substrate were added to the reaction
mixture, and the ADRP reaction was monitored by changes in absorbance
at 360 nm in a UV spectrophotometer. To check for phosphate
contamination, appropriate control reactions were performed
in the presence of enzyme, but with no substrate and vice versa. No
measurable phosphate contamination was detected from the
enzyme preparation, substrate degradation, or the buffers. The
assay components were checked with known amounts of phosphate
standard supplied by the manufacturer. A molar extinction coefficient
of 11,000 M⁻¹ cm⁻¹ for the product of the PNP reaction at
360 nm was used to quantitate the amount of released inorganic
phosphate (Etzkorn et al., 1994).
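
The quantitation step described above follows the Beer-Lambert law,
c = ΔA / (ε · l). The sketch below illustrates the arithmetic using the
extinction coefficient of 11,000 M-1 cm-1 quoted in the text. The 1 cm
path length, the 1:1 stoichiometry between released phosphate and the
360 nm product, the function names, and the example absorbance change
are assumptions made for illustration and are not reported in this study.

```python
# Minimal sketch: convert a change in absorbance at 360 nm into released
# inorganic phosphate, assuming one mole of the 360 nm-absorbing product
# forms per mole of phosphate released.

EPSILON_360 = 11_000.0  # M^-1 cm^-1, extinction coefficient from the text


def phosphate_molar(delta_a360: float, path_cm: float = 1.0,
                    epsilon: float = EPSILON_360) -> float:
    """Phosphate concentration (mol/L) from an absorbance change at 360 nm."""
    if path_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    return delta_a360 / (epsilon * path_cm)


def phosphate_nmol(delta_a360: float, volume_ml: float = 1.0,
                   path_cm: float = 1.0) -> float:
    """Total phosphate (nmol) released in a reaction of the given volume."""
    concentration = phosphate_molar(delta_a360, path_cm)  # mol/L
    return concentration * (volume_ml * 1e-3) * 1e9       # L -> mol -> nmol


# Hypothetical reading: an absorbance increase of 0.11 in a 1 ml reaction.
example_delta_a = 0.11
print(f"{phosphate_molar(example_delta_a) * 1e6:.1f} uM phosphate")
print(f"{phosphate_nmol(example_delta_a):.1f} nmol in 1 ml")
```

With these assumed inputs, an absorbance change of 0.11 corresponds to
10 μM phosphate, or 10 nmol in the 1 ml reaction.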